{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sT8AyHRMNh41"
   },
   "source": [
    "# Working with TensorFlow Recommenders: Quickstart\n",
    "\n",
    "## Learning Objectives\n",
    "\n",
    "1. Read the data and build vocabularies.\n",
    "2. Define a model.\n",
    "3. Create and train a model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8f-reQ11gbLB"
   },
   "source": [
    "## Introduction\n",
    "\n",
    "In this tutorial, you build a simple matrix factorization model using the [MovieLens 100K dataset](https://grouplens.org/datasets/movielens/100k/) with TFRS. You can use this model to recommend movies for a given user.\n",
    "\n",
    "Each learning objective will correspond to a __#TODO__ in the [student lab notebook](../labs/quickstart.ipynb) -- try to complete that notebook first before reviewing this solution notebook. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qA00wBE2Ntdm"
   },
   "source": [
    "### Import TFRS\n",
    "\n",
    "First, install and import TFRS:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "6yzAaM85Z12D"
   },
   "outputs": [],
   "source": [
    "# Install required packages.\n",
    "!pip install -q tensorflow-recommenders\n",
    "!pip install -q --upgrade tensorflow-datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<strong>Note:</strong> Please ignore any incompatibility warnings and errors and re-run the above cell before proceeding.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "n3oYt3R6Nr9l"
   },
   "outputs": [],
   "source": [
    "# Import necessary libraries.\n",
    "from typing import Dict, Text\n",
    "\n",
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "\n",
    "import tensorflow_datasets as tfds\n",
    "import tensorflow_recommenders as tfrs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zCxQ1CZcO2wh"
   },
   "source": [
    "### Read the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "M-mxBYjdO5m7"
   },
   "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "\u001b[1mDownloading and preparing dataset 4.70 MiB (download: 4.70 MiB, generated: 32.41 MiB, total: 37.10 MiB) to /home/jupyter/tensorflow_datasets/movielens/100k-ratings/0.1.0...\u001b[0m\n"
      ]
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "bda2dda7cdef41d6bc93fc618cd01649",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Dl Completed...: 0 url [00:00, ? url/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "fa088d74169240e0a724732b9139ea11",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Dl Size...: 0 MiB [00:00, ? MiB/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "b7fff5cf37ad41eb8e656f7674b977fa",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Extraction completed...: 0 file [00:00, ? file/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Generating splits...:   0%|          | 0/1 [00:00<?, ? splits/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Generating train examples...:   0%|          | 0/100000 [00:00<?, ? examples/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Shuffling movielens-train.tfrecord...:   0%|          | 0/100000 [00:00<?, ? examples/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "\u001b[1mDataset movielens downloaded and prepared to /home/jupyter/tensorflow_datasets/movielens/100k-ratings/0.1.0. Subsequent calls will reuse this data.\u001b[0m\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
       "2021-10-28 12:53:26.758781: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.\n"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "\u001b[1mDownloading and preparing dataset 4.70 MiB (download: 4.70 MiB, generated: 150.35 KiB, total: 4.84 MiB) to /home/jupyter/tensorflow_datasets/movielens/100k-movies/0.1.0...\u001b[0m\n"
      ]
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "eb85abdb265d4170ac0536dcd0eda77d",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Dl Completed...: 0 url [00:00, ? url/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "d902d5df68aa46e98d8b7027b59b9574",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Dl Size...: 0 MiB [00:00, ? MiB/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "82824749137b464bbacf956e109ae935",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Extraction completed...: 0 file [00:00, ? file/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Generating splits...:   0%|          | 0/1 [00:00<?, ? splits/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Generating train examples...:   0%|          | 0/1682 [00:00<?, ? examples/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
        "model_id": "",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
        "Shuffling movielens-train.tfrecord...:   0%|          | 0/1682 [00:00<?, ? examples/s]"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "\u001b[1mDataset movielens downloaded and prepared to /home/jupyter/tensorflow_datasets/movielens/100k-movies/0.1.0. Subsequent calls will reuse this data.\u001b[0m\n"
      ]
     }
   ],
   "source": [
    "# TODO 1\n",
    "# Ratings data.\n",
    "ratings = tfds.load('movielens/100k-ratings', split=\"train\")\n",
    "# Features of all the available movies.\n",
    "movies = tfds.load('movielens/100k-movies', split=\"train\")\n",
    "\n",
    "# Select the basic features.\n",
    "ratings = ratings.map(lambda x: {\n",
    "    \"movie_title\": x[\"movie_title\"],\n",
    "    \"user_id\": x[\"user_id\"]\n",
    "})\n",
    "movies = movies.map(lambda x: x[\"movie_title\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5W0HSfmSNCWm"
   },
   "source": [
    "Build vocabularies to convert user ids and movie titles into integer indices for embedding layers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "9I1VTEjHzpfX"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2021-10-27 11:13:41.722881: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)\n"
     ]
    }
   ],
   "source": [
    "# Build the vocabularies.\n",
    "user_ids_vocabulary = tf.keras.layers.StringLookup(mask_token=None)\n",
    "user_ids_vocabulary.adapt(ratings.map(lambda x: x[\"user_id\"]))\n",
    "\n",
    "movie_titles_vocabulary = tf.keras.layers.StringLookup(mask_token=None)\n",
    "movie_titles_vocabulary.adapt(movies)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Lrch6rVBOB9Q"
   },
   "source": [
    "### Define a model\n",
    "\n",
    "We can define a TFRS model by inheriting from `tfrs.Model` and implementing the `compute_loss` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "e5dNbDZwOIHR"
   },
   "outputs": [],
   "source": [
    "class MovieLensModel(tfrs.Model):\n",
    "  # We derive from a custom base class to help reduce boilerplate. Under the hood,\n",
    "  # these are still plain Keras Models.\n",
    "\n",
    "  def __init__(\n",
    "      self,\n",
    "      user_model: tf.keras.Model,\n",
    "      movie_model: tf.keras.Model,\n",
    "      task: tfrs.tasks.Retrieval):\n",
    "    super().__init__()\n",
    "\n",
    "    # Set up user and movie representations.\n",
    "    self.user_model = user_model\n",
    "    self.movie_model = movie_model\n",
    "\n",
    "    # Set up a retrieval task.\n",
    "    self.task = task\n",
    "\n",
    "  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:\n",
    "    # Define how the loss is computed.\n",
    "\n",
    "    user_embeddings = self.user_model(features[\"user_id\"])\n",
    "    movie_embeddings = self.movie_model(features[\"movie_title\"])\n",
    "\n",
    "    return self.task(user_embeddings, movie_embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wdwtgUCEOI8y"
   },
   "source": [
    "Define the two models and the retrieval task."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "EvtnUN6aUY4U"
   },
   "outputs": [],
   "source": [
    "# TODO 2\n",
    "# Define user and movie models.\n",
    "user_model = tf.keras.Sequential([\n",
    "    user_ids_vocabulary,\n",
    "    tf.keras.layers.Embedding(user_ids_vocabulary.vocabulary_size(), 64)\n",
    "])\n",
    "movie_model = tf.keras.Sequential([\n",
    "    movie_titles_vocabulary,\n",
    "    tf.keras.layers.Embedding(movie_titles_vocabulary.vocabulary_size(), 64)\n",
    "])\n",
    "\n",
    "# Define your objectives.\n",
    "task = tfrs.tasks.Retrieval(metrics=tfrs.metrics.FactorizedTopK(\n",
    "    movies.batch(128).map(movie_model)\n",
    "  )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BMV0HpzmJGWk"
   },
   "source": [
    "\n",
    "### Fit and evaluate it.\n",
    "\n",
    "Create the model, train it, and generate predictions:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "H2tQDhqkOKf1"
   },
   "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "Epoch 1/3\n",
       "25/25 [==============================] - 33s 1s/step - factorized_top_k/top_1_categorical_accuracy: 1.0000e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0016 - factorized_top_k/top_10_categorical_accuracy: 0.0047 - factorized_top_k/top_50_categorical_accuracy: 0.0438 - factorized_top_k/top_100_categorical_accuracy: 0.0988 - loss: 33083.5114 - regularization_loss: 0.0000e+00 - total_loss: 33083.5114\n",
       "Epoch 2/3\n",
       "25/25 [==============================] - 32s 1s/step - factorized_top_k/top_1_categorical_accuracy: 1.2000e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0047 - factorized_top_k/top_10_categorical_accuracy: 0.0143 - factorized_top_k/top_50_categorical_accuracy: 0.1051 - factorized_top_k/top_100_categorical_accuracy: 0.2094 - loss: 31010.0450 - regularization_loss: 0.0000e+00 - total_loss: 31010.0450\n",
       "Epoch 3/3\n",
       "25/25 [==============================] - 31s 1s/step - factorized_top_k/top_1_categorical_accuracy: 4.5000e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0082 - factorized_top_k/top_10_categorical_accuracy: 0.0219 - factorized_top_k/top_50_categorical_accuracy: 0.1429 - factorized_top_k/top_100_categorical_accuracy: 0.2675 - loss: 30418.8558 - regularization_loss: 0.0000e+00 - total_loss: 30418.8558\n",
       "Top 3 recommendations for user 42: [b'Rent-a-Kid (1995)' b'Just Cause (1995)' b'House Arrest (1996)']\n"
      ]
     }
   ],
   "source": [
    "# TODO 3\n",
    "# Create a retrieval model.\n",
    "model = MovieLensModel(user_model, movie_model, task)\n",
    "model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))\n",
    "\n",
    "# Train for 3 epochs.\n",
    "model.fit(ratings.batch(4096), epochs=3)\n",
    "\n",
    "# Use brute-force search to set up retrieval using the trained representations.\n",
    "index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)\n",
    "index.index_from_dataset(\n",
    "    movies.batch(100).map(lambda title: (title, model.movie_model(title))))\n",
    "\n",
    "# Get some recommendations.\n",
    "_, titles = index(np.array([\"42\"]))\n",
    "print(f\"Top 3 recommendations for user 42: {titles[0, :3]}\")"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "quickstart.ipynb",
   "private_outputs": true,
   "provenance": [],
   "toc_visible": true
  },
  "environment": {
   "kernel": "python3",
   "name": "tf2-gpu.2-6.m82",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-6:m82"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
